Prediction of lateral lymph node metastasis in rectal cancer patients based on MRI using clinical, deep transfer learning, radiomic, and fusion models

Introduction: Lateral lymph node (LLN) metastasis in rectal cancer significantly affects patient treatment and prognosis. This study aimed to comprehensively compare the performance of various predictive models in predicting LLN metastasis.

Methods: In this retrospective study, data from 152 rectal cancer patients who underwent lateral lymph node dissection (LLND) were collected. The cohort was divided into a training set (n=86) from Tianjin Union Medical Center (TUMC) and two testing cohorts: a testing cohort from TUMC (n=37) and a testing cohort from Gansu Provincial Hospital (GSPH) (n=29). A clinical model was established using clinical data. Deep transfer learning models and radiomics models were developed using MRI images of the primary tumor (PT), the largest short-axis LLN (LLLN), and the visible LLN (VLLN) areas, along with a fusion model that integrates features from both deep transfer learning and radiomics. The diagnostic value of these models for LLN metastasis was analyzed based on postoperative LLN pathology.

Results: Models based on LLLN image information generally outperformed those based on PT image information. Radiomics models based on LLLN demonstrated improved robustness on the external testing cohort compared to those based on VLLN. Specifically, the radiomics model based on LLLN imaging achieved an AUC of 0.741 in the testing cohort (TUMC) and 0.713 in the testing cohort (GSPH) with the extra trees algorithm.

Conclusion: Data from LLLN is a more reliable basis for predicting LLN metastasis in rectal cancer patients with suspicious LLN metastasis than data from PT. Among models performing adequately on the internal test set, all showed declines on the external test set, with LLLN_Rad_Models being less affected by scanning parameters and data sources.

Introduction
Lateral lymph node (LLN) metastasis is a significant route of metastasis for mid- and low rectal cancers, with a reported metastasis rate of 20.1% (1). Current treatment strategies for suspected LLN metastasis include: 1. total mesorectal excision (TME) after neoadjuvant chemoradiotherapy (nCRT); 2. TME combined with lateral lymph node dissection (LLND); and 3. TME combined with LLND after nCRT (2,3). Accurate diagnosis of LLN metastasis is crucial for determining the appropriate treatment strategy. Preoperative pathological or cytological evidence of LLNs is difficult to obtain; hence, the diagnosis of LLN metastasis primarily relies on imaging studies. The short-axis diameter of the lymph node is the most critical parameter for assessing the presence of metastasis (4). However, immune responses induced by tumors can also lead to lymph node enlargement, which does not necessarily indicate tumor cell metastasis. Indeed, nonmetastatic lymph node enlargement is an indicator of better long-term prognosis in colorectal cancer (CRC) patients (5). Currently, commonly used imaging methods for diagnosing lymph node metastasis include MRI, CT, positron emission tomography (PET)/CT, and endorectal ultrasound. These imaging techniques demonstrate relatively low sensitivity and specificity in determining the nature of lymph nodes (6-8).
Over the past decade, the field of radiomics has established itself as an important technique in quantitative image analysis. Radiomics involves the extraction of a large number of quantitative features from medical images using sophisticated data characterisation algorithms (9). These features can then be used to build predictive models of clinical outcomes, improving the accuracy of medical diagnoses and treatment plans (10). While radiomics focuses on pre-defined features extracted from images, deep learning approaches, particularly deep transfer learning models, have gained popularity for their ability to automatically learn features from data. Deep transfer learning uses pre-trained neural networks that can be fine-tuned to specific medical imaging tasks, reducing the need for large labelled datasets (11). Radiomics and deep transfer learning technologies have demonstrated exceptional capabilities in disease diagnosis, molecular typing, and predicting treatment responses (12). Studies have shown that in the diagnosis of rectal cancer lymph node metastasis, radiomics models exhibit greater diagnostic efficacy than traditional imaging methods (13). This study aimed to explore optimal methods for constructing machine learning diagnostic models for detecting LLN metastasis in rectal cancer patients with suspected LLN metastasis.

Study cohort
In this study, data from 152 rectal cancer patients whose MRI-documented LLNs exceeded 5 mm in short-axis diameter were retrospectively collected, all of whom had undergone LLND. A clinical diagnostic model was constructed, along with seven other models developed specifically for LLNs and the primary tumor (PT). Three types of models were developed for both the largest short-axis LLN (LLLN) and the PT: a deep transfer learning (DTL) diagnostic model, a radiomic model, and a fusion model that integrates features from both DTL and radiomics. Additionally, a radiomic model was developed based on the visible LLNs (VLLN).
Written informed consent was waived in this retrospective study. The study protocol was approved by the Ethics Committee of Tianjin Union Medical Center (TUMC) (Approval No. 2022-C23) and the Ethics Committee of Gansu Provincial Hospital (GSPH) (Approval No. 2024-243). Clinical and imaging data of rectal cancer patients who met the following criteria were collected from June 2017 to May 2024.
The inclusion criteria were as follows: 1. Patients who underwent LLND at the same time as TME and who had pathologically confirmed rectal cancer; 2. Patients with pelvic MR images and LLNs with short-axis diameters exceeding 5 mm on MRI, as assessed by the surgical team preoperatively.
The exclusion criteria were as follows: 1. Patients without T2WI data; 2. Patients without complete clinical and pathological information; 3. Patients whose LLNs were not visible on horizontal T2WI images because they were outside the field of view of the scan, even though LLNs greater than 5 mm in the short axis could be detected in the sagittal or coronal plane; 4. Patients who received nCRT, induction neoadjuvant chemotherapy, or consolidation neoadjuvant chemotherapy before surgery and whose LLNs were pathologically negative, to account for the potentially curative effect of nCRT and its subsequent effect on modeling.
According to the postoperative pathological results of the LLNs, the patients were divided into two groups: the LLN metastasis group, consisting of patients with one or more pathologically positive LLNs, and the non-LLN metastasis group, consisting of patients with no pathologically positive LLNs. Patients from TUMC were randomly divided at a 7:3 ratio into a training cohort (TUMC) (n=86) and a testing cohort (TUMC) (n=37). Patients from GSPH were designated as the testing cohort (GSPH) (n=29).
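The 7:3 random division of the TUMC patients can be illustrated with a minimal sketch. The seed, the patient identifiers, and the helper name `split_cohort` are hypothetical, and the text does not say whether the split was stratified by LLN status, so a plain shuffle is assumed here.

```python
import random

def split_cohort(patient_ids, train_fraction=0.7, seed=0):
    """Randomly split patient IDs into a training list and a testing list."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]

# 152 patients in total, 29 of them from GSPH, which leaves 123 from TUMC.
tumc_ids = [f"TUMC_{i:03d}" for i in range(123)]  # placeholder identifiers
train_ids, test_ids = split_cohort(tumc_ids)
print(len(train_ids), len(test_ids))  # 86 37
```

With 123 TUMC patients, a 0.7 fraction rounds to 86 training patients, which matches the cohort sizes reported above.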

Region of interest segmentation
We obtained MR-T2W images of the pelvis at admission from the picture archiving and communication system at Tianjin Union Medical Center. The horizontal MR-T2W images obtained from the patient cohort were exported to the 3D Slicer program (v.5.2.2) for ROI segmentation. A radiologist with more than five years of experience in the field used this software to delineate the boundaries of the PT and the VLLN.

Radiomics feature extraction
In our study, we used PyRadiomics to extract a total of 1,198 radiomics features from the PT and from each VLLN. The extracted features include first-order features, shape-based features, and various texture features categorized into a gray-level co-occurrence matrix (GLCM), gray-level dependence matrix (GLDM), gray-level run length matrix (GLRLM), gray-level size zone matrix (GLSZM), and neighborhood gray-tone difference matrix (NGTDM). The proportions of each category are illustrated in Supplementary Figure S1. The detailed parameters used for radiomic feature extraction are described in the Supplementary Materials and can also be found on the PyRadiomics website (https://pyradiomics.readthedocs.io/en/latest/). The configuration file for feature extraction is provided in the Supplementary File. Radiomics features from the PT were used to construct PT_Rad_Models (radiomics models based on primary tumor). Radiomics features from the LLLN were used to construct LLLN_Rad_Models (radiomics models based on largest short-axis lateral lymph node). For each participant, the maximum, minimum, mean, median (when the number of VLLNs is even, the median is the mean of the two middle values), and standard deviation of each feature across all VLLNs were recorded, resulting in a total of 5,990 radiomics features (1,198 features × 5 statistics) per patient for VLLN_Rad_Models (radiomics models based on all visible lateral lymph nodes).
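The per-patient aggregation of VLLN features can be sketched as follows. This is an illustrative sketch only: the function name, the two feature names in the example, and the use of the population standard deviation (so that a patient with a single visible node gets 0.0) are assumptions that the text does not specify.

```python
import statistics

def aggregate_node_features(node_features):
    """Collapse per-node feature dicts into one per-patient feature dict.

    node_features: list with one dict per visible lymph node, each mapping
    feature name -> value. Five summary statistics are kept per feature.
    """
    aggregated = {}
    for name in node_features[0]:
        values = [node[name] for node in node_features]
        aggregated[f"{name}_max"] = max(values)
        aggregated[f"{name}_min"] = min(values)
        aggregated[f"{name}_mean"] = statistics.fmean(values)
        # For an even number of nodes this is the mean of the two middle values.
        aggregated[f"{name}_median"] = statistics.median(values)
        # Population SD (assumption): defined even when only one node is visible.
        aggregated[f"{name}_std"] = statistics.pstdev(values)
    return aggregated

# Two hypothetical nodes with two illustrative features each.
nodes = [
    {"firstorder_Mean": 1.0, "shape_Sphericity": 0.8},
    {"firstorder_Mean": 3.0, "shape_Sphericity": 0.6},
]
patient_vector = aggregate_node_features(nodes)
print(len(patient_vector))  # 10, i.e. 2 features x 5 statistics
```

Applied to the full feature set, the same scheme yields 1,198 × 5 = 5,990 values per patient regardless of how many nodes are visible.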

Radiomics feature selection and model construction
The radiomics features were standardized using z score normalization. Mann-Whitney U tests were then used to screen all radiomic features, and only features with p values < 0.05 were retained. To handle strong correlations between features (Spearman correlation coefficient ≥ 0.9), we employed a greedy recursive feature deletion strategy for feature filtering. This strategy iteratively removes the feature with the highest redundancy within the current feature set until the set no longer contains strongly correlated features. To further refine the features, multivariate least absolute shrinkage and selection operator (LASSO) regression was employed. After LASSO feature selection, we conducted supervised learning using eight diverse machine learning classifiers: random forest (RF), k-nearest neighbor (KNN), logistic regression (LR), multilayer perceptron (MLP), support vector machine (SVM), extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), and ExtraTrees. Twenty-four models were constructed: eight PT_Rad_Models, eight LLLN_Rad_Models, and eight VLLN_Rad_Models.
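The greedy correlation filter can be sketched in plain Python as below. The text does not define "redundancy", so this sketch assumes it means the number of other features with |Spearman rho| ≥ 0.9; the function names and the toy data are hypothetical.

```python
def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant feature carries no rank information
    return cov / (var_x * var_y) ** 0.5

def greedy_correlation_filter(features, threshold=0.9):
    """Drop the most redundant feature until no pair has |rho| >= threshold."""
    kept = list(features)
    while kept:
        # Redundancy (assumption): number of strongly correlated partners.
        redundancy = {name: 0 for name in kept}
        for i, a in enumerate(kept):
            for b in kept[i + 1:]:
                if abs(spearman(features[a], features[b])) >= threshold:
                    redundancy[a] += 1
                    redundancy[b] += 1
        worst = max(kept, key=lambda name: redundancy[name])
        if redundancy[worst] == 0:
            break
        kept.remove(worst)
    return kept

toy = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],   # monotone in "a", so rho = 1
    "c": [1, 4, 9, 16, 25],  # monotone in "a", so rho = 1
    "d": [5, 1, 4, 2, 3],    # weakly related to the others
}
selected = greedy_correlation_filter(toy)
print(selected)  # ['c', 'd']
```

In the toy data, "a", "b", and "c" are mutually redundant, so two of them are removed and one representative survives alongside "d".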

Clinical model construction
The clinical characteristics and radiological features in Table 1 were used to construct the clinical model. These features were standardized using z score normalization. Next, feature selection was performed using t tests and chi-square tests (P < 0.10) to screen for clinical risk factors for LLN metastasis in the training set, followed by training eight diverse machine learning classifiers.
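The z score normalization step can be sketched as follows. The text does not state which data the mean and standard deviation were computed on; this sketch assumes they are estimated on the training cohort and reused for the testing cohorts, which is a common way to avoid information leakage. The function names and the values are hypothetical.

```python
import statistics

def fit_zscore(train_values):
    """Estimate the mean and (population) SD on the training cohort only."""
    return statistics.fmean(train_values), statistics.pstdev(train_values)

def apply_zscore(values, mean, sd):
    """Standardize values with previously fitted parameters."""
    if sd == 0:
        return [0.0 for _ in values]  # constant feature: nothing to scale
    return [(v - mean) / sd for v in values]

train_feature = [2.0, 4.0, 6.0]   # illustrative training values
test_feature = [4.0, 8.0]         # illustrative testing values
mean, sd = fit_zscore(train_feature)
z_train = apply_zscore(train_feature, mean, sd)
z_test = apply_zscore(test_feature, mean, sd)
```

After the transformation, the training values have mean 0 and standard deviation 1, while the testing values are expressed on the same scale.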

DTL model development and feature extraction
For the PT and LLLN, the slice with the largest ROI area was selected. In that slice, the region enclosed by the smallest rectangle bounding the ROI was saved as a PNG image. The ResNet18 network was pretrained on the ImageNet dataset, and transfer learning was subsequently performed on the training set. ImageNet is a large-scale image database that contains millions of labeled images across thousands of categories. ImageNet-based transfer learning has been used in many medical studies. We employed a global fine-tuning strategy to update the parameters, thereby adapting ResNet18 for the prediction of LLN metastasis. The learning rate was set to 0.005, the number of epochs was set to 50, and the Adam optimizer was used to update the parameters. Two models were constructed: PT_DTL_ResNet18 (deep transfer learning on primary tumor using ResNet18) and LLLN_DTL_ResNet18 (deep transfer learning on largest short-axis lateral lymph node using ResNet18). The trained ResNet18 could be used to predict the probability of LLN metastasis for each rectangular image.
After training ResNet18, we used it to extract 512 deep learning features for each patch from the average pooling layer, the penultimate layer of the network.
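The patch preparation described above (largest-area slice, then the smallest bounding rectangle around the ROI) can be sketched with nested lists. The function name and the tiny example volume are hypothetical, and saving the patch as PNG and any resizing before it is fed to ResNet18 are omitted.

```python
def largest_roi_patch(image, mask):
    """Return (slice index, 2D patch) for the slice where the ROI is largest.

    image, mask: 3D nested lists indexed [slice][row][col]; mask holds 0/1.
    The patch is the smallest axis-aligned rectangle that contains the ROI.
    """
    areas = [sum(sum(row) for row in sl) for sl in mask]
    best = max(range(len(mask)), key=lambda k: areas[k])
    if areas[best] == 0:
        raise ValueError("mask contains no ROI voxels")
    rows = [r for r, row in enumerate(mask[best]) if any(row)]
    cols = [c for row in mask[best] for c, v in enumerate(row) if v]
    r0, r1 = min(rows), max(rows)
    c0, c1 = min(cols), max(cols)
    patch = [row[c0:c1 + 1] for row in image[best][r0:r1 + 1]]
    return best, patch

# Two 3x4 slices; the ROI covers 1 pixel on slice 0 and a 2x2 block on slice 1.
image = [
    [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]],
    [[12, 13, 14, 15], [16, 17, 18, 19], [20, 21, 22, 23]],
]
mask = [
    [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]],
    [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0]],
]
slice_index, patch = largest_roi_patch(image, mask)
print(slice_index, patch)  # 1 [[17, 18], [21, 22]]
```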

Construction of the fusion model
This study employed a feature-level fusion strategy to establish the fusion models. Feature-level fusion, also known as early fusion, involves concatenating all features from different modalities into a single feature vector. The radiomics features of the primary tumor were extracted using PyRadiomics, while the deep learning (DL) features were obtained through ResNet18, as described above. These DL and radiomics features were standardized using z score normalization. Subsequently, U tests, Spearman correlation analyses, and LASSO analyses were performed to select the features, followed by training eight diverse machine learning classifiers. Sixteen models were constructed: eight PT_Fusion_Models (models combining radiomics and deep transfer learning features based on the primary tumor) and eight LLLN_Fusion_Models (models combining radiomics and deep transfer learning features based on the largest short-axis lateral lymph node).
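Feature-level fusion amounts to concatenating the two feature sets for each patient before feature selection. The sketch below uses placeholder feature names and zero values. The `rad_` and `dl_` prefixes and the function name are assumptions for illustration.

```python
def fuse_features(radiomics, deep):
    """Early fusion: merge two named feature dicts into one candidate pool."""
    fused = {f"rad_{name}": value for name, value in radiomics.items()}
    fused.update({f"dl_{name}": value for name, value in deep.items()})
    return fused

# Placeholder vectors with the dimensions reported in the text.
radiomics = {f"feature_{i}": 0.0 for i in range(1198)}
deep = {f"feature_{i}": 0.0 for i in range(512)}
fused = fuse_features(radiomics, deep)
print(len(fused))  # 1710 candidate features before selection
```

Prefixing the names keeps the origin of each feature traceable after selection, which is how a selected set can later be reported as so many radiomic and so many deep learning features.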

Model validation and comparison
After construction, the prediction models were validated in the testing cohort (TUMC) and the testing cohort (GSPH). The sensitivity, specificity, precision, and F1 score were measured to evaluate the diagnostic accuracy. Additionally, confusion matrices and waterfall plots were used for further comparison. Receiver operating characteristic (ROC) curves and the area under the curve (AUC) were generated to evaluate the discrimination performance of the prediction models. Decision curve analysis (DCA) was performed to assess the clinical utility and net benefit of the models. The flowchart of the study is illustrated in Figure 1.
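The evaluation metrics named above can be computed from predicted probabilities as in the following sketch. The 0.5 decision threshold, the function names, and the toy labels and scores are assumptions, not values from the study. The AUC is computed with the rank-based (Mann-Whitney) formulation.

```python
def safe_div(numerator, denominator):
    """Return 0.0 instead of failing when a denominator is zero."""
    return numerator / denominator if denominator else 0.0

def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity, precision, and F1 from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = safe_div(tp, tp + fn)
    precision = safe_div(tp, tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": safe_div(tn, tn + fp),
        "precision": precision,
        "f1": safe_div(2 * precision * sensitivity, precision + sensitivity),
    }

def auc(y_true, scores):
    """Probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return safe_div(wins, len(pos) * len(neg))

# Toy example: 1 = LLN metastasis, 0 = no LLN metastasis.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold is an assumption
metrics = classification_metrics(y_true, y_pred)
model_auc = auc(y_true, scores)
```

In this toy case two of three positives and two of three negatives are classified correctly, and eight of the nine positive-negative pairs are ranked correctly, giving an AUC of 8/9.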

Baseline characteristics and clinical model analysis
This study involved a cohort of 152 patients with a mean age of 59.09 years (± 11.6 years). The sex distribution revealed that 63.8% of …

FIGURE 1 The workflow of the clinical, radiomic, DTL (ResNet18), and fusion (radiomics and DTL) models. (DTL, deep transfer learning; PT, primary tumor; LLLN, largest short-axis lateral lymph node; VLLN, visible lateral lymph nodes; PT_Rad_Models, radiomics models based on primary tumor; PT_Fusion_Models, models combining radiomics and deep transfer learning features based on the primary tumor; LLLN_Rad_Models, radiomics models based on largest short-axis lateral lymph node; VLLN_Rad_Models, radiomics models based on all visible lateral lymph nodes; LLLN_Fusion_Models, models combining radiomics and deep transfer learning features based on largest short-axis lateral lymph node; PT_DTL_ResNet18, deep transfer learning on primary tumor using ResNet18; LLLN_DTL_ResNet18, deep transfer learning on largest short-axis lateral lymph node using ResNet18; ROC, receiver operating characteristic; DCA, decision curve analysis; HIS, Hospital Information System).

Feature selection

Primary tumor radiomic features
We ultimately identified 8 key radiomic features of the primary tumor (PT) out of the 1,198 radiomic features (Figure 2A). These features were selected specifically for constructing the PT_Rad_Models.

Primary tumor fusion features
We identified 30 fusion features for the primary tumor, comprising 10 key radiomic features out of 1,198 radiomic features and 20 key deep learning features out of 512 deep learning features (Figure 2B). These features were used to develop the PT_Fusion_Models.

LLLN radiomic features
We ultimately identified 18 key radiomic features of the LLLN out of 1,198 radiomic features (Figure 3A). These features were selected specifically for constructing the LLLN_Rad_Models.

VLLN radiomic features
We ultimately identified 16 key radiomic features of the VLLN out of 5,990 radiomic features (Figure 3B). These features were selected specifically for constructing the VLLN_Rad_Models.

LLLN fusion features
… features out of 512 deep learning features (Figure 3C). These features were used to develop the LLLN_Fusion_Models.

The complete set of feature information is available in Supplementary Materials 2.

PT_Rad_Models
Figure 4A shows the ROC curves of the PT_Rad_Models built with different algorithms in the training and testing cohorts. For the training cohort (TUMC), the AUC values for the LR, SVM, KNN, RF, ExtraTrees, XGBoost, LightGBM, and MLP models were 0.816, 0.897, 0.803, 0.929, 0.845, 0.989, 0.859, and 0.843, respectively. For the testing cohort (TUMC), the AUC values were 0.574, 0.670, 0.530, 0.566, 0.570, 0.564, 0.589, and 0.604, respectively. For the testing cohort (GSPH), the AUC values were 0.532, 0.637, 0.524, 0.445, 0.521, 0.503, 0.495, and 0.642, respectively. Detailed statistical evaluations of the PT_Rad_Models are presented in Supplementary Table S2. For a comparison of accuracy across different algorithms in the PT_Rad_Models, see Supplementary Figure S6. The confusion matrices for the training and test cohorts of the PT_Rad_Models are shown in Supplementary Figure S12. Waterfall plots for the training and test cohorts of the PT_Rad_Models can be found in Supplementary Figure S18. The results of the DCA for the training and test cohorts of the PT_Rad_Models are presented in Supplementary Figure S24.

LLLN_Rad_Models
Figure 5A shows the ROC curves of the LLLN_Rad_Models built with different algorithms in the training and testing cohorts. For the training cohort (TUMC), the AUC values for the LR, SVM, KNN, RF, ExtraTrees, XGBoost, LightGBM, and MLP models were 0.969, 0.976, 0.926, 0.983, 0.942, 1.000, 0.957, and 0.965, respectively. For the testing cohort (TUMC), the AUC values were 0.744, 0.738, 0.662, 0.723, 0.741, 0.698, 0.743, and 0.807, respectively. For the testing cohort (GSPH), the AUC values were 0.526, 0.642, 0.621, 0.629, 0.713, 0.684, 0.555, and 0.553, respectively. Detailed statistical evaluations of the LLLN_Rad_Models are presented in Supplementary Table S4. For a comparison of accuracy across different algorithms in the LLLN_Rad_Models, see Supplementary Figure S8. The confusion matrices for the training and test cohorts of the LLLN_Rad_Models are shown in Supplementary Figure S14. Waterfall plots for the training and test cohorts of the LLLN_Rad_Models can be found in Supplementary Figure S20. The results of the DCA for the training and test cohorts of the LLLN_Rad_Models are presented in Supplementary Figure S26.

VLLN_Rad_Models
Figure 5B shows the ROC curves of the VLLN_Rad_Models built with different algorithms in the training and testing cohorts. For the training cohort (TUMC), the AUC values for the LR, SVM, KNN, RF, ExtraTrees, XGBoost, LightGBM, and MLP models were 0.963, 0.969, 0.958, 0.975, 0.945, 1.000, 0.951, and 0.954, respectively. For the testing cohort (TUMC), the AUC values were 0.792, 0.762, 0.766, 0.740, 0.793, 0.728, 0.753, and 0.801, respectively. For the testing cohort (GSPH), the AUC values were 0.516, 0.589, 0.505, 0.463, 0.445, 0.566, 0.584, and 0.505, respectively. Detailed statistical evaluations of the VLLN_Rad_Models are presented in Supplementary Table S5. For a comparison of accuracy across different algorithms in the VLLN_Rad_Models, see Supplementary Figure S9. The confusion matrices for the training and test cohorts of the VLLN_Rad_Models are shown in Supplementary Figure S15. Waterfall plots for the training and test cohorts of the VLLN_Rad_Models can be found in Supplementary Figure S21. The results of the DCA for the training and test cohorts of the VLLN_Rad_Models are presented in Supplementary Figure S27.
In terms of AUC, both the LLLN_Rad_Models and the VLLN_Rad_Models performed better in the testing cohort (TUMC) than the PT_Rad_Models for all eight algorithms. In the testing cohort (GSPH), the classification ability of the VLLN_Rad_Models decreased substantially in terms of AUC; the LLLN_Rad_Models also declined, but to a lesser extent than the VLLN_Rad_Models (Figure 6, Supplementary Figure S29).
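The comparison above can be checked directly from the AUC values reported in this section. The sketch below only re-tabulates those reported numbers. Averaging across the eight algorithms is an illustrative summary added here, not an analysis performed in the study.

```python
from statistics import fmean

# Order: LR, SVM, KNN, RF, ExtraTrees, XGBoost, LightGBM, MLP (as reported).
auc_tumc = {
    "PT_Rad": [0.574, 0.670, 0.530, 0.566, 0.570, 0.564, 0.589, 0.604],
    "LLLN_Rad": [0.744, 0.738, 0.662, 0.723, 0.741, 0.698, 0.743, 0.807],
    "VLLN_Rad": [0.792, 0.762, 0.766, 0.740, 0.793, 0.728, 0.753, 0.801],
}
auc_gsph = {
    "PT_Rad": [0.532, 0.637, 0.524, 0.445, 0.521, 0.503, 0.495, 0.642],
    "LLLN_Rad": [0.526, 0.642, 0.621, 0.629, 0.713, 0.684, 0.555, 0.553],
    "VLLN_Rad": [0.516, 0.589, 0.505, 0.463, 0.445, 0.566, 0.584, 0.505],
}

# Does each lymph-node model beat the PT model for every algorithm (TUMC)?
beats_pt = {
    family: all(a > b for a, b in zip(auc_tumc[family], auc_tumc["PT_Rad"]))
    for family in ("LLLN_Rad", "VLLN_Rad")
}

# Mean AUC drop from the internal (TUMC) to the external (GSPH) testing cohort.
mean_drop = {
    family: fmean(auc_tumc[family]) - fmean(auc_gsph[family])
    for family in auc_tumc
}
for family, drop in mean_drop.items():
    print(f"{family}: mean AUC drop {drop:.3f}")
```

On these numbers the mean drop is about 0.117 for the LLLN_Rad_Models and about 0.245 for the VLLN_Rad_Models, consistent with the smaller decline described for LLLN.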

PT_Fusion_Models
Figure 4B shows the ROC curves of the PT_Fusion_Models built with different algorithms in the training and testing cohorts. For the training cohort (TUMC), the AUC values for the LR, SVM, KNN, RF, ExtraTrees, XGBoost, LightGBM, and MLP models were 0.991, 0.999, 0.961, 0.991, 0.976, 1.000, 0.973, and 0.992, respectively. For the testing cohort (TUMC), the AUC values were 0.557, 0.601, 0.565, 0.545, 0.574, 0.536, 0.574, and 0.565, respectively. For the testing cohort (GSPH), the AUC values were 0.495, 0.505, 0.526, 0.521, 0.568, 0.574, 0.476, and 0.458, respectively. Detailed statistical evaluations of the PT_Fusion_Models are presented in Supplementary Table S3. For a comparison of accuracy across different algorithms in the PT_Fusion_Models, see Supplementary Figure S7. The confusion matrices for the training and test cohorts of the PT_Fusion_Models are shown in Supplementary Figure S13. Waterfall plots for the training and test cohorts of the PT_Fusion_Models can be found in Supplementary Figure S19. The results of the DCA for the training and test cohorts of the PT_Fusion_Models are presented in Supplementary Figure S25.
… PT_DTL_ResNet18 was 0.326, while LLLN_DTL_ResNet18 still retained some classification ability with an AUC of 0.621. A possible reason for this discrepancy is the difference in scanning parameters between the two hospitals, which led to poor performance in the testing cohort (GSPH). PT images are more susceptible to scanning parameter variations due to their dependency on imaging quality and contrast settings, whereas LLLN images provide more consistent features and are less affected by such variations.
In the training set, the LLLN_Fusion_Models exhibited high AUC values, indicating a good fit to the training data, whereas the LLLN_Rad_Models had lower training AUC values. However, in the testing cohort (TUMC), the LLLN_Fusion_Models did not outperform the LLLN_Rad_Models across all algorithms (Supplementary Figure S32). This suggests that, within the methodological framework used in this study, a richer feature pool does not enhance the models' predictive efficacy on new datasets. Integrating a larger number of features might lead to models that perform well on training data but fail to generalize to new, unseen data, because the models capture noise rather than underlying patterns.
Many machine learning studies on lymph node metastasis diagnosis in rectal cancer do not differentiate between mesorectal lymph nodes and LLNs (18-23). As a result, the models can only predict whether lymph node metastasis is present in patients but cannot determine whether metastasis occurs in the mesorectum or the LLNs. This limitation restricts the clinical applicability of the models. A few focused studies have attempted to address this issue. Yan H and colleagues constructed a diagnostic model for LLN metastasis based on clinical risk factors and radiomic features from MR images of primary rectal tumors and LLNs, achieving an AUC of 0.836 (24). Similarly, Yang H and others developed a model based on radiomic features from MR and CT images of LLNs combined with clinical risk factors, achieving an AUC of 0.936 (25). These studies segmented all VLLNs, extracting 112 radiomic features from each VLLN. The maximum, minimum, mean, median, and standard deviation of each feature across all visible LLNs of each participant were recorded and analyzed using logistic regression. These studies did not perform external validation. Our research increased the number of extracted features to 1,198, incorporated fusion models and DTL models, and included an external testing cohort. In terms of AUC, our findings show that while the VLLN_Rad_Models outperformed the LLLN_Rad_Models in the internal testing cohort (TUMC), their classification ability markedly declined in the external testing cohort (GSPH), making them less effective than the LLLN_Rad_Models. This may be because the features of the single largest lateral lymph node are more stable and less affected by variations in scanning parameters and image quality. Handling features of a single lymph node also simplifies the model, reducing the risk of overfitting.
There are several limitations to this study. First, the relatively small sample size may limit the robustness of the results. Further multicenter studies with larger sample sizes are required to improve the diagnostic accuracy of the model and to validate its generalizability in predicting the pathological characteristics of LLN in rectal cancer patients prior to nCRT or surgery. Second, this study included patients who received nCRT before LLND, and only those with postoperative LLN pathology confirmed as positive were included. It is assumed that LLN metastasis occurred before nCRT and did not develop during treatment. This assumption might lead to bias in the results, as it does not consider the possibility that LLN metastasis could occur during nCRT, thereby affecting the accuracy and applicability of the predictive model based on pre-nCRT data.

Conclusion
This study demonstrated the diagnostic potential of radiomic, deep transfer learning, and fusion models for predicting LLN metastasis in rectal cancer patients. The use of LLLN data proved to be a more reliable basis for model prediction than PT data. While the fusion models showed high AUC values in the training set, they did not outperform the radiomic models when applied to unseen data. Among models performing adequately on the internal test set, all showed declines on the external test set, with LLLN_Rad_Models for diagnosing LLN metastasis being less affected by scanning parameters and data sources compared to other models.

Funding
The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This research was funded by the Tianjin Key Medical Discipline (Specialty) Construction Project (No. TJYXZDXK-044A) and the hospital-level scientific research fund of Tianjin Union Medical Center (No. 2022GCXK001).

FIGURE 4 ROC curves for the ability of the radiomics models and fusion (radiomics and DTL) models to predict LLN metastasis in the training and validation cohorts. (A) Radiomics models based on PT. (B) Fusion models based on PT. (PT, primary tumor; PT_Rad_Models, radiomics models based on primary tumor; PT_Fusion_Models, models combining radiomics and deep transfer learning features based on the primary tumor; ROC, receiver operating characteristic; RF, random forest; KNN, k-nearest neighbor; LR, logistic regression; MLP, multilayer perceptron; SVM, support vector machine; XGBoost, extreme gradient boosting; LightGBM, light gradient boosting machine; TUMC, Tianjin Union Medical Center; GSPH, Gansu Provincial Hospital).

TABLE 1 Characteristics of patients in the training and test cohorts.